Phase transition on a context-sensitive random language model with short range interactions

Yuma Toji, Jun Takahashi, Vwani Roychowdhury, Hideyuki Miyahara

arXiv.org Machine Learning

Since the random language model was proposed by E. DeGiuli [Phys. Rev. Lett. 122, 128301], language models have been investigated intensively from the viewpoint of statistical mechanics. Recently, the existence of a Berezinskii–Kosterlitz–Thouless transition was numerically demonstrated in models with long-range interactions between symbols. In statistical mechanics, it has long been known that long-range interactions can induce phase transitions. Therefore, it has remained unclear whether phase transitions observed in language models originate from genuinely linguistic properties that are absent in conventional spin models. In this study, we construct a random language model with short-range interactions and numerically investigate its statistical properties. Our model belongs to the class of context-sensitive grammars in the Chomsky hierarchy and allows explicit reference to contexts. We find that a phase transition occurs even when the model refers only to contexts whose length remains constant with respect to the sentence length. This result indicates that finite-temperature phase transitions in language models are genuinely induced by the intrinsic nature of language, rather than by long-range interactions.


Scorio.jl: A Julia package for ranking stochastic responses

Mohsen Hariri, Michael Hinczewski, Vipin Chaudhary

arXiv.org Machine Learning

Scorio.jl is a Julia package for evaluating and ranking systems from repeated responses to shared tasks. It provides a common tensor-based interface for direct score-based, pairwise, psychometric, voting, graph, and listwise methods, so the same benchmark can be analyzed under multiple ranking assumptions. We describe the package design, position it relative to existing Julia tools, and report pilot experiments on synthetic rank recovery, stability under limited trials, and runtime scaling.


Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm

Giulia Luise, Saverio Salzo, Massimiliano Pontil, Carlo Ciliberto

Neural Information Processing Systems

We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation.


Trajectory Alignment: Understanding the Edge of Stability Phenomenon via Bifurcation Theory

Neural Information Processing Systems

Cohen et al. (2021) empirically study the evolution of the largest eigenvalue of the loss Hessian, also known as sharpness, along the gradient descent (GD) trajectory and observe the Edge of Stability (EoS) phenomenon.